A robust RNN-based pre-classification for noisy Mandarin speech recognition
Abstract
This paper addresses the problem of speech signal pre-classification for robust recognition of noisy speech. A novel RNN-based pre-classification scheme for noisy Mandarin speech recognition is proposed. The RNN, which is trained to be insensitive to noise-level variation, classifies each input frame into one of three broad classes: initial, final, and pure noise. On-line noise tracking and estimation is then performed for noise model compensation. In addition, a broad-class likelihood compensation based on the RNN outputs is applied to aid recognition. Experimental results show a significant improvement in syllable recognition rate in non-stationary noise environments.
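The abstract does not give the update rules, so the following is only a minimal sketch of how the two compensation steps could fit together. It assumes the RNN emits per-frame posteriors over (initial, final, noise). Exponential averaging for the noise estimate and a weighted log-posterior bonus for the likelihood compensation are illustrative assumptions, not the authors' method. All names (`track_noise`, `compensate_log_likelihood`, `alpha`, `threshold`, `weight`) are hypothetical.

```python
import math

# Broad classes assumed to be output by the RNN, in this order.
CLASSES = ("initial", "final", "noise")


def track_noise(frames, posteriors, alpha=0.9, threshold=0.5):
    """Update a running noise-mean estimate from frames classified as pure noise.

    Assumption: exponential averaging; the paper's actual estimator is not
    described in the abstract.
    """
    noise_mean = None
    for frame, post in zip(frames, posteriors):
        if post[CLASSES.index("noise")] < threshold:
            continue  # speech frame: leave the noise estimate untouched
        if noise_mean is None:
            noise_mean = list(frame)  # first noise frame initializes the estimate
        else:
            noise_mean = [alpha * m + (1.0 - alpha) * x
                          for m, x in zip(noise_mean, frame)]
    return noise_mean


def compensate_log_likelihood(log_lik, state_class, post, weight=1.0, floor=1e-6):
    """Add a weighted log posterior of the state's broad class to its log-likelihood.

    Assumption: an additive log-domain bonus; the floor avoids log(0).
    """
    p = max(post[CLASSES.index(state_class)], floor)
    return log_lik + weight * math.log(p)
```

In this sketch, a recognizer state belonging to a broad class that the RNN considers unlikely for the current frame is penalized, while the noise estimate is refreshed only on frames the RNN labels as noise.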
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
An RNN-Based Pre-classification Method for Fast Continuous Mandarin Speech Recognition
A novel RNN-based front-end pre-classification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states of Initial (I), Final (F), and Silenc...
Robust SBR method for adverse Mandarin speech recognition - Electronics Letters
An RNN-based robust signal bias removal (RRSBR) method is proposed for improving both the recognition performance and the computational efficiency of the SBR method for adverse Mandarin speech recognition. It differs from the SBR method in using three broad-class sub-codebooks to encode the feature vector of each frame and combining the three encoding residuals to form the frame-level s...
RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion
In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. The prosodic modeling is performed in the post-processing stage of acoustic decoding and aims at detecting word-boundary cues to assist in linguistic decoding. It employs a simple three-layer RNN to learn the relationship between input prosodic features, extracted f...
Publication date: 1997